An Efficient Statistical Speech Act Type Tagging System for Speech Translation Systems

Authors

Hideki Tanaka

Akio Yokoo

Abstract

This paper describes a new, efficient speech act type tagging system. The system covers the tasks of (1) segmenting a turn into the optimal number of speech act units (SA units), and (2) assigning a speech act type tag (SA tag) to each SA unit. Our method is based on a theoretically clear statistical model that integrates linguistic, acoustic and situational information. We report tagging experiments on Japanese and English dialogue corpora manually labeled with SA tags, and then discuss the performance difference between the two languages. We also report on translation experiments on positive response expressions using SA tags.

1 Introduction

This paper describes a statistical speech act type tagging system that utilizes linguistic, acoustic and situational features. This work can be viewed as a study on automatic "Discourse Tagging", whose objective is to assign tags to discourse units in texts or dialogues. Discourse tagging is studied mainly from two different viewpoints, i.e., the linguistic and engineering viewpoints. The work described here belongs to the latter group. More specifically, we are interested in automatically recognizing the speech act types of utterances and in applying them to speech translation systems.

Several studies on discourse tagging to date have been motivated by engineering applications. The early studies by Nagata and Morimoto (1994) and Reithinger and Maier (1995) showed the possibility of predicting dialogue act tags for next utterances with statistical methods. These studies, however, presupposed properly segmented utterances, which is not a realistic assumption. In contrast to this assumption, automatic utterance segmentation (or discourse segmentation) is desired here.

Discourse segmentation in linguistics, whether manual or automatic, has also received keen attention because such segmentation provides the foundation of higher discourse structures (Grosz and Sidner, 1986).
Discourse segmentation has also received keen attention from the engineering side because the natural language processing systems that follow the speech recognition system are designed to accept linguistically meaningful units (Stolcke and Shriberg, 1996). Much research has followed this line, such as Stolcke and Shriberg (1996) and Cettolo and Falavigna (1998), to mention only a few. We can take advantage of these studies as a preprocessing step for tagging. In this paper, however, we propose a statistical tagging system that optimally performs segmentation and tagging at the same time. Previous studies such as Litman and Passonneau (1995) have pointed out that the use of multiple information sources can contribute to better segmentation and tagging, and so our statistical model integrates linguistic, acoustic and situational information.

The problem can be formalized as a search problem on a word graph, which can be efficiently handled by an extended dynamic programming algorithm. In fact, we can find the optimal solution efficiently without limiting the search space at all.

The results of our tagging experiments on Japanese and English corpora indicated high performance for Japanese but considerably lower performance for English.

This work also reports on the use of speech act type tags for translating Japanese and English positive response expressions. Positive responses appear quite often in task-oriented dialogues like those in our tasks. They are often highly ambiguous and problematic in speech translation. We will show that these expressions can be effectively translated with the help of dialogue information, which we call speech act type tags.

2 The Problems

In this section, we briefly explain our speech act type tags and the tagged data and then formally define the tagging problem.
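
As a rough illustration of the search formulation mentioned in the introduction, the following Python sketch performs segmentation and tagging jointly with an exhaustive dynamic program over all unit boundaries and tags. It is not the authors' model: the function `segment_and_tag`, the scoring callbacks `unit_score` and `transition_score`, the toy table `WORD_LOGP`, and the tag names `ACCEPT` and `THANK` are all hypothetical, and acoustic and situational features are omitted entirely.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Unit = Tuple[List[str], str]  # (words of one SA unit, its SA tag)


def segment_and_tag(
    words: Sequence[str],
    tags: Sequence[str],
    unit_score: Callable[[Sequence[str], str], float],
    transition_score: Callable[[str, str], float],
    start_tag: str = "<s>",
) -> Tuple[List[Unit], float]:
    """Return the highest-scoring (segmentation, tag sequence) and its score.

    Scores are treated as log-probabilities and summed. No pruning is applied,
    so every boundary placement and tag assignment is considered.
    """
    n = len(words)
    if n == 0:
        return [], 0.0
    # best[end][tag] = (best score of any analysis of words[:end] whose last
    # unit carries `tag`, backpointer (start, prev_tag) of that last unit)
    best: List[Dict[str, Tuple[float, Optional[Tuple[int, str]]]]] = [
        {} for _ in range(n + 1)
    ]
    best[0][start_tag] = (0.0, None)
    for end in range(1, n + 1):
        for tag in tags:
            for start in range(end):
                unit = unit_score(words[start:end], tag)
                for prev_tag, (prev_score, _) in best[start].items():
                    score = prev_score + transition_score(prev_tag, tag) + unit
                    if tag not in best[end] or score > best[end][tag][0]:
                        best[end][tag] = (score, (start, prev_tag))
    # Pick the best final tag, then follow backpointers to recover the units.
    tag = max(best[n], key=lambda t: best[n][t][0])
    total = best[n][tag][0]
    units: List[Unit] = []
    end = n
    while end > 0:
        start, prev_tag = best[end][tag][1]
        units.append((list(words[start:end]), tag))
        end, tag = start, prev_tag
    units.reverse()
    return units, total


# Toy scores, invented purely for illustration.
WORD_LOGP = {"ACCEPT": {"yes": -0.5}, "THANK": {"thank": -0.5, "you": -0.5}}


def toy_unit_score(unit_words: Sequence[str], tag: str) -> float:
    # Sum of per-word log scores; words unseen for the tag get a low score.
    return sum(WORD_LOGP[tag].get(w, -5.0) for w in unit_words)


def toy_transition_score(prev_tag: str, tag: str) -> float:
    # Constant cost per unit, which discourages over-segmentation.
    return -1.0


units, total = segment_and_tag(
    ["yes", "thank", "you"], ["ACCEPT", "THANK"],
    toy_unit_score, toy_transition_score,
)
print(units, total)
```

With n words and T tags, this sketch examines on the order of n^2 * T^2 combinations, so it stays tractable for turn-length inputs while still searching the full space.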


Similar resources

A Part-of-Speech Tagging System for the Persian Language

Abstract: Part-of-speech (POS) tagging is essential groundwork for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...


On the Relationship between Emotional Intelligence and Directive Speech Acts Preference

Language and emotion are two related systems in use, in that one system (emotions) impacts the performance of the other (language). Both of them share their functionality in communication. Since the nature of foreign language classrooms is ideally interactional, emotional intelligence (EI) gains importance. The aim of this study was to find out whether one's total emotional quotient and its com...


Real-Time Statistical Speech Translation

This research investigates the Statistical Machine Translation approaches to translate speech in real time automatically. Such systems can be used in a pipeline with speech recognition and synthesis software in order to produce a real-time voice communication system between foreigners. We obtained three main data sets from spoken proceedings that represent three different types of human speech....


Impact of Collaborative Output-Based Instruction on EFL Learners’ Awareness of the Speech Act of Apology

A sizeable body of research into instructed pragmatics stems from the noticing hypothesis, comparing implicit and explicit instruction. It is only recently that other theories, including the output hypothesis, have been researched as possible explanations of interlanguage pragmatic development. Pursuing the same line of research, the present study addressed the impact of collaborative o...


Exploring the Use of Target-Language Information to Train the Part-of-Speech Tagger of Machine Translation Systems

When automatically translating between related languages, one of the main sources of machine translation errors is the incorrect resolution of part-of-speech (PoS) ambiguities. Hidden Markov models (HMM) are the standard statistical approach to try to properly resolve such ambiguities. The usual training algorithms collect statistics from source-language texts in order to adjust the parameters ...




Publication date: 1999
